1 Information extraction: Coreference resolution strategies from an application 
perspective 

Lois C. Childs, David Dadd, Norris Heintzelman 

October 1998 Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998 

Publisher: Association for Computational Linguistics 

Full text available: pdf(519.19 KB) Additional Information: full citation, abstract 

As part of our TIPSTER III research program, we have continued our research into 
strategies to resolve coreferences within a free text document; this research was begun 
during our TIPSTER II research program. In the TIPSTER II Proceedings paper, "An 
Evaluation of Coreference Resolution Strategies for Acquiring Associated Information," the 
goal was to evaluate the contributions of various techniques for associating an entity with 
three types of information: 1) name variations, 2) descriptive phra ... 
XML query processing I: Composing XSL transformations with XML publishing views 
Chengkai Li, Philip Bohannon, P. P. S. Narayan 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on 
Management of data 
Publisher: ACM Press 

Full text available: pdf(225.65 KB) Additional Information: full citation, abstract, references, citings, index terms 

While the XML Stylesheet Language for Transformations (XSLT) was not designed as a 
query language, it is well-suited for many query-like operations on XML documents 
including selecting and restructuring data. Further, it actively fulfills the role of an XML 
query language in modern applications and is widely supported by application platform 
software. However, the use of database techniques to optimize and execute XSLT has 
only recently received atten ... 



Adaptive information extraction 

Jordi Turmo, Alicia Ageno, Neus Catala 

July 2006 ACM Computing Surveys (CSUR), Volume 38 issue 2 
Publisher: ACM Press 

Full text available: pdfOSe.SS KB) Additional Information: full citation, abstract, references, index terms 

The growing availability of online textual sources and the potential number of applications 
of knowledge acquisition from textual data has led to an increase in Information 
Extraction (IE) research. Some examples of these applications are the generation of data 
bases from documents, as well as the acquisition of knowledge useful for emerging 
technologies like question answering, information integration, and others related to text 
mining. However, one of the main drawbacks of the application of ... 

Keywords: Information extraction, machine learning 

http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFID=8220300&CFTOKEN=3475 1/3/07 

Results (page 1): document template resolution 



Delivering a large information database 
Christina L. Klein 

February 1996 Proceedings of the 13th annual international conference on Systems 
documentation: emerging from chaos: solutions for the growing 
complexity of our jobs 

Publisher: ACM Press 

Full text available: pdf(1.01 MB) Additional Information: full citation, index terms 



Active rules for XML: A new paradigm for E-services 
Angela Bonifati, Stefano Ceri, Stefano Paraboschi 

August 2001 The VLDB Journal — The International Journal on Very Large Data 

Bases, volume 10 Issue 1 

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdftSI.SS KB) Additional Information: full citation, abstract, citings, index terms 

XML is rapidly becoming one of the most widely adopted technologies for information 
exchange and representation. As the use of XML becomes more widespread, we foresee 
the development of active XML rules, i.e., rules explicitly designed for the management of 
XML information. In particular, we argue that active rules for XML offer a natural 
paradigm for the rapid development of innovative e-services. In the paper, we show how 
active rules can be specified in the context of XSLT, a pattern-based la ... 

Keywords: Active databases, Document management, Query languages for XML, XML, 
XSLT 



6 Document management: A ground-truthing engine for proofsetting, publishing, 
re-purposing and quality assurance 
Steven J. Simske, Margaret Sturgill 

November 2003 Proceedings of the 2003 ACM symposium on Document engineering 
Publisher: ACM Press 

Full text available: pdf(165.66 KB) Additional Information: full citation, abstract, references, index terms 

We present design strategies, implementation preferences and throughput results 
obtained in deploying a UI-based ground truthing engine as the last step in the quality 
assurance (QA) for the conversion of a large out-of-print book collection into digital form. 
A series of automated QA steps were first performed on the document. Five distinct 
zoning analysis options were deployed and the PDF output thence generated was used to 
regenerate TIFF files for comparison to the originals. Regenerated TIF ... 

Keywords: layout, print-on-demand, region management, templates 



7 System descriptions: MITRE-Bedford: description of the ALEMBIC system as used 
for MUC-4 

John Aberdeen, John Burger, Dennis Connolly, Susan Roberts, Marc Vilain 

June 1992 Proceedings of the 4th conference on Message understanding MUC4 '92 
Publisher: Association for Computational Linguistics 

Full text available: pdf(566.09 KB) Additional Information: full citation, abstract, references 

The ALEMBIC text understanding system fielded at MUC-4 by MITRE-Bedford is primarily 
based on natural language techniques. ALEMBIC is a research prototype that is intended 
to explore several major areas of investigation:* Error recovery, involving primarily issues 
of semi-parsing and recovery of plausible attachments.* Robustness, involving primarily 
issues of uncertain reasoning and tractable inference.* Self-extensibility, focusing 
primarily on machine learning of natural langua ... 

8 Project notes and demos: The week at a glance: cross-language cross-document 
information extraction and translation 

Jim Cowie, Yevgeny Ludovik, Hugo Molina-Salgado, Sergei Nirenburg 

July 2000 Proceedings of the 18th conference on Computational linguistics - Volume 
2 

Publisher: Association for Computational Linguistics 

Full text available: pdf(302.89 KB) Additional Information: full citation, abstract, references 

Work on the production of texts in English describing instances of a particular event type 
from multiple news sources will be described. A system has been developed which 
extracts events, such as meetings, from texts in English, Russian, Spanish, and Japanese. 
The extraction is currently carried out using only ontological information. The results of a 
set of such extractions were combined to produce a table of event instances, date 
stamped, with links back to the original documents. The original ... 

9 Information extraction: Tasks, domains, and languages for information extraction 
Boyan Onyshkevych, Mary Ellen Okurowski, Lynn Carlson 

September 1993 Proceedings of a workshop on held at Fredericksburg, Virginia: 
September 19-23, 1993 
Publisher: Association for Computational Linguistics 
Full text available: pdf(709.08 KB) Additional Information: full citation, abstract 

The information extraction tasks for the ARPA TIPSTER program center on automatically 
filling object-oriented data structures, called templates, with information extracted from 
free text in news stories (for discussion of templates and objects, see "Template Design 
for Information Extraction" in this volume). With text as input, the TIPSTER systems first 
detect whether the text contains relevant information. If so, the systems extract specific 
instances of generic types of information tha ... 

10 Optimizing document format: Compression of scan-digitized Indian language printed 
text: a soft pattern matching technique 
U. Garain, S. Debnath, A. Mandal, B. B. Chaudhuri 

November 2003 Proceedings of the 2003 ACM symposium on Document engineering 

Publisher: ACM Press 

Full text available: pdf(272.00 KB) Additional Information: full citation, abstract, references, index terms 

In this paper, a new compression scheme is presented for Indian Language (IL) textual 
document images. Since OCR technology for IL scripts is not matured enough, 
transcription of these documents into digital domain needs new techniques that achieve 
high degree of compression as well as suitable methods to perform various operations like 
document indexing, retrieval, etc. The proposed method is essentially based on symbolic 
compression technique, which has been realized with an efficient segmenta ... 

Keywords: data compression, Indian language, pattern matching, textual image 
Ralph Weischedel, Damaris Ayuso, Sean Boisen, Heidi Fox, Tomoyoshi Matsukawa, 
Constantine Papageorgiou, Dawn MacLaughlin, Masaichiro Kitawa, Tsutomu Sakai, June Abe, 
Hiroto Hosihi, Yoichi Miyamoto, Scott Miller 

September 1993 Proceedings of a workshop on held at Fredericksburg, Virginia: 
September 19-23, 1993 

Publisher: Association for Computational Linguistics 

Full text available: pdf(1.13 MB) Additional Information: full citation, abstract, references 

Traditional approaches to the problem of extracting data from texts have emphasized 
hand-crafted linguistic knowledge. In contrast, BBN's PLUM system (Probabilistic 
Language Understanding Model) was developed as part of an ARPA-funded research 
effort on integrating probabilistic language models with more traditional linguistic 
techniques. Our research and development goals are:* Achieving high performance in 
objective evaluations, such as the ... 

12 Information extraction task: Tasks, domains, and languages 
Boyan Onyshkevych, Mary Ellen Okurowski, Lynn Carlson 

August 1993 Proceedings of the 5th conference on Message understanding MUC5 '93 
Publisher: Association for Computational Linguistics 
Full text available: pdf(686.13 KB) Additional Information: full citation, abstract 

The Fifth Message Understanding Conference (MUC-5) involved the same tasks, domains 
and languages as the information extraction portion of the ARPA TIPSTER program. These 
tasks center on automatically filling object-oriented data structures, called templates, with 
information extracted from free text in news stories (for discussion of templates and 
objects, see "Template Design for Information Extraction" in this volume). For each task, 
a generic type of information that is specified for ... 

13 Systems: The NYU system for MUC-6 or where's the syntax? 
Ralph Grishman 

November 1995 Proceedings of the 6th conference on Message understanding MUC6 '95 

Publisher: Association for Computational Linguistics 

Full text available: pdf(619.38 KB) Additional Information: full citation, abstract, references 

Over the past five MUCs, New York University has clung faithfully to the idea that 
information extraction should begin with a phase of full syntactic analysis, followed by a 
semantic analysis of the syntactic structure. Because we have a good, broad-coverage 
English grammar and a moderately effective method for recovering from parse failures, 
this approach held us in fairly good stead. 

14 Making use of document standards and models: A framework for structure, layout & 
function in documents 
John Lumley, Roger Gimson, Owen Rees 

November 2005 Proceedings of the 2005 ACM symposium on Document engineering 
DocEng '05 

Publisher: ACM Press 

Full text available: pdfd.SS MB) Additional Information: full citation, abstract, references, index terms 

The Document Description Framework (DDF) is a representation for variable-data 
documents. It supports very high flexibility in the type and extent of variation supported, 
considerably beyond the 'copy-hole' or flow-based mechanisms of existing formats and 
tools. DDF is based on holding application data, logical data structure and presentation 
as well as constructional 'programs' together within a single document. DDF documents 
can be merged with other documents, bound to variable values increme ... 

Keywords: SVG, XML, XSLT, document construction, functional programming 
15 Using GATE as an environment for teaching NLP 

Kalina Bontcheva, Hamish Cunningham, Valentin Tablan, Diana Maynard, Oana Hamza 

July 2002 Proceedings of the ACL-02 Workshop on Effective tools and methodologies 
for teaching natural language processing and computational linguistics - 
Volume 1 

Publisher: Association for Computational Linguistics 

Full text available: pdf(439.53 KB) Additional Information: full citation, abstract, references 

In this paper we argue that the GATE architecture and visual development environment 
can be used as an effective tool for teaching language engineering and computational 
linguistics. Since GATE comes with a customisable and extendable set of components, it 
allows students to get hands-on experience with building NLP applications. GATE also has 
tools for corpus annotation and performance evaluation, so students can go through the 
entire application development process within its graphical develop ... 

16 Design of communication: Solutions documentation 
Vanadis Crawford, Angela Pitts, Rosalind Radcliffe, Leah Ann Seifert 

October 2004 Proceedings of the 22nd annual international conference on Design of 
communication: The engineering of quality documentation 
Publisher: ACM Press 

Full text available: pdf(221.04 KB) Additional Information: full citation, abstract, references, index terms 

In today's software environment, more and more products must be installed and 
configured in concert with one another. Unfortunately, most software is developed 
product-by-product and the approach to information development is in alignment with the 
individual development projects. In the end, a user may have to have as many as 20 
publications open and 7 help systems up to understand how to implement the overall 
solution for his or her installation. This paper will discuss the need for cross-pr ... 

Keywords: documentation, process, software engineering, solutions 



17 XML transactions: An object-oriented extension of XML for autonomous web 
applications 

Hasan M. Jamil, Giovanni A. Modica 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 
Publisher: ACM Press 

Full text available: pdf(277.52 KB) Additional Information: full citation, abstract, references, index terms 

While the idea of extending XML to include object-oriented features has been gaining 
popularity in general, the potential of inheritance in document design has not been well 
recognized in contemporary research. In this paper we demonstrate that XML with 
dynamic inheritance aids better document designs and decreased management overheads 
and support increased autonomy. As an extended application, we point out that dynamic 
inheritance also helps effective automated web portal and ontology designs. W ... 

Keywords: XML, autonomous objects, document structuring, dynamic object hierarchy, 
inheritance, object-orientation, web 



18 Short papers session 1: Automatic document orientation detection and categorization 
through document vectorization 
Shijian Lu, Chew Lim Tan 

October 2006 Proceedings of the 14th annual ACM international conference on 
Multimedia MULTIMEDIA '06 

Publisher: ACM Press 

Full text available: pdf(557.98 KB) Additional Information: full citation, abstract, references, index terms 

This paper presents an automatic orientation detection and categorization technique that 
is capable of detecting the orientation of multilingual documents with arbitrary skew and 
categorizing document images according to the underlying languages. We carry out 
orientation detection and categorization through document vectorization, which encodes 
document orientation and language information and converts each document image into 
an electronic document vector through the exploitation of the density ... 

Keywords: document image, document orientation detection 



19 An issue-oriented approach to judicial document assembly 
L. Karl Branting 

August 1993 Proceedings of the 4th international conference on Artificial intelligence 
and law 
Publisher: ACM Press 

Full text available: pdf(749.95 KB) Additional Information: full citation, references, citings, index terms 



20 Sources of Success for Boosted Wrapper Induction 
David Kauchak, Joseph Smarr, Charles Elkan 

December 2004 The Journal of Machine Learning Research, Volume 5 
Publisher: MIT Press 

Full text available: pdf(281.46 KB) Additional Information: full citation, abstract, references, index terms 

In this paper, we examine an important recent rule-based information extraction (IE) 
technique named Boosted Wrapper Induction (BWI) by conducting experiments on a 
wider variety of tasks than previously studied, including tasks using several collections of 
natural text documents. We investigate systematically how each algorithmic component of 
BWI, in particular boosting, contributes to its success. We show that the benefit of 
boosting arises from the ability to reweight examples to learn specifi ... 